AUC-RF: a new strategy for genomic profiling with random forest.
Abstract
Objective: Genomic profiling, the use of genetic variants at multiple loci simultaneously for the prediction of disease risk, requires the selection of a set of genetic variants that best predicts disease status. The goal of this work was to provide a new selection algorithm for genomic profiling.
Methods: We propose a new algorithm for genomic profiling based on optimizing the area under the receiver operating characteristic curve (AUC) of the random forest (RF). The proposed strategy implements a backward elimination process based on the initial ranking of variables.
Results and conclusions: We demonstrate the advantage of using the AUC instead of the classification error as a measure of predictive accuracy of RF. In particular, we show that the use of the classification error is especially inappropriate when dealing with unbalanced data sets. The new procedure for variable selection and prediction, namely AUC-RF, is illustrated with data from a bladder cancer study and also with simulated data. The algorithm is publicly available as an R package, named AUCRF, at http://cran.r-project.org/.
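The abstract makes two technical points: AUC rather than classification error should measure predictive accuracy on unbalanced data, and variables are eliminated backward from an initial ranking. The minimal Python sketch below illustrates both. It is not the AUCRF package (which is written in R) and it does not fit a random forest. The `score_subset` callback is a hypothetical stand-in for an AUC estimate of a forest refitted on each subset (for example, an out-of-bag AUC). `drop_fraction` is an assumed parameter that sets how many of the lowest-ranked variables are removed per step.

```python
from typing import Callable, List, Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the Mann-Whitney probability that a randomly chosen case
    (label 1) scores higher than a randomly chosen control (label 0).
    Tied scores count as one half."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("AUC needs at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def error_rate(scores: Sequence[float], labels: Sequence[int],
               threshold: float = 0.5) -> float:
    """Classification error after thresholding scores at `threshold`."""
    wrong = sum((s >= threshold) != (y == 1) for s, y in zip(scores, labels))
    return wrong / len(labels)


def backward_elimination(
    ranking: Sequence[str],
    score_subset: Callable[[Sequence[str]], float],
    drop_fraction: float = 0.2,  # hypothetical default, not taken from AUCRF
    min_size: int = 1,
) -> Tuple[List[str], float]:
    """Backward elimination over a fixed initial ranking (most important
    first). At each step the lowest-ranked `drop_fraction` of the current
    variables is removed (at least one), and the subset with the highest
    score seen so far is kept. `score_subset` is a hypothetical stand-in
    for, e.g., the out-of-bag AUC of a random forest refitted on the subset."""
    current = list(ranking)
    best_set, best_score = list(current), score_subset(current)
    while len(current) > min_size:
        n_drop = max(1, int(len(current) * drop_fraction))
        current = current[: max(min_size, len(current) - n_drop)]
        score = score_subset(current)
        if score > best_score:
            best_set, best_score = list(current), score
    return best_set, best_score


if __name__ == "__main__":
    # Unbalanced toy data: 95 controls, 5 cases, and a model that gives
    # every subject the same score.
    labels = [0] * 95 + [1] * 5
    flat = [0.1] * 100
    print(error_rate(flat, labels))  # 0.05: looks accurate
    print(auc(flat, labels))         # 0.5: no discrimination at all

    # Toy scorer (hypothetical): rewards the informative variables g1-g3
    # and penalizes subset size.
    informative = {"g1", "g2", "g3"}

    def toy_score(subset: Sequence[str]) -> float:
        return 0.5 + 0.05 * len(informative & set(subset)) - 0.01 * len(subset)

    best, _ = backward_elimination([f"g{i}" for i in range(1, 11)], toy_score)
    print(best)  # ['g1', 'g2', 'g3']
```

With 95 controls and 5 cases, a model that gives every subject the same score misclassifies only 5% of subjects. Its AUC, however, is 0.5, meaning no discrimination at all. This is the kind of imbalance problem for which the abstract calls classification error inappropriate. Keeping the best-scoring subset instead of stopping at the first drop in score lets the search pass through small local dips.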
Similar resources
AUC-RF: A New Strategy for Genomic Profiling with Random Forest
Objective: Genomic profiling, the use of genetic variants at multiple loci simultaneously for the prediction of disease risk, requires the selection of a set of genetic variants that best predicts disease status. The goal of this work was to provide a new selection algorithm for genomic profiling. Methods: We propose a new algorithm for genomic profiling based on optimizing the area under the r...
Evaluation of genomic prediction accuracy in different genomic architectures of quantitative and threshold traits with imputation of simulated genomic data, using the random forest method
Genomic selection is a promising challenge for discovering genetic variants influencing quantitative and threshold traits for improving the genetic gain and accuracy of genomic prediction in animal breeding. Since a proportion of genotypes are generally uncalled, prediction of genomic accuracy requires imputation of missing genotypes. The objectives of this study were (1) to quantify...
Comparison of random forest (RF) and boosted regression tree (BRT) models in predicting the presence of dominant rangeland species in the Polour rangelands, Mazandaran
For this study, the Polour rangelands in Mazandaran province, covering about 2017 ha, were chosen. The purpose of this study was to predict the dominant species of the rangeland using random forest (RF) and boosted regression tree (BRT) models. Equal random sampling of vegetation and soil was carried out. Twelve work units were obtained in the region that climatic, topography an...
Journal title:
- Human heredity
Volume 72, Issue 2
Publication date: 2011